refactor(mtp): MtpSource enum + auto-detect MTP tensors #2

Merged
dusterbloom merged 1 commit into
feat/dflash-mtp-foundation from
refactor/mtp-source-enum
May 21, 2026
Conversation

@dusterbloom
Owner

Summary

Stacked on Luce-Org#237 (Luce-Org/lucebox-hub). Addresses @howard0su's review comments on dflash/src/common/backend_factory.h:

  • Line 57 ("shall we use the unsloth single-file MTP-in-target GGUF?"): --mtp-gguf is now optional. When absent, gguf_contains_mtp_tensors() probes the primary GGUF for qwen35.nextn_predict_layers > 0 and selects MtpSource::Native automatically.
  • Line 59 ("why not a enum?"): enum class MtpSource { None, Native, ExternalDrafter, Auto } replaces the string-keyed mtp_draft_source and the load-shape-implied-by-pointer mtp_gguf_path.

Changes

// Before
const char * mtp_gguf_path    = nullptr;      // sentinel for shape
const char * mtp_draft_source = nullptr;      // "chain" | "mtp_topk"

// After
enum class MtpSource { None, Native, ExternalDrafter, Auto };
MtpSource    mtp_source       = MtpSource::None;
const char * mtp_gguf_path    = nullptr;      // only for ExternalDrafter
bool         mtp_use_topk     = false;        // false=chain, true=mtp_topk
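
One consequence of the new shape is that the enum and the path can disagree, which the old pointer-as-sentinel layout could not express. A minimal sketch of a consistency check follows. The struct name `MtpConfig` and the helper `validate_mtp_config` are hypothetical and do not appear in this PR; only the three fields and the enum are taken from the snippet above.

```cpp
enum class MtpSource { None, Native, ExternalDrafter, Auto };

// Hypothetical container for the three fields shown above.
struct MtpConfig {
    MtpSource    mtp_source    = MtpSource::None;
    const char * mtp_gguf_path = nullptr;  // only for ExternalDrafter
    bool         mtp_use_topk  = false;    // false=chain, true=mtp_topk
};

// Hypothetical helper: returns nullptr when the config is consistent,
// otherwise a short message describing the problem.
const char * validate_mtp_config(const MtpConfig & c) {
    const bool has_path = c.mtp_gguf_path != nullptr && c.mtp_gguf_path[0] != '\0';
    if (c.mtp_source == MtpSource::ExternalDrafter && !has_path) {
        return "MtpSource::ExternalDrafter requires mtp_gguf_path";
    }
    return nullptr;
}
```

Whether a stray path in Native or None mode should be rejected or silently ignored is a design choice this sketch leaves open.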

dflash_server CLI

  • --mtp-source [none|native|external|auto] — new explicit flag
  • --mtp-gguf PATH — now optional; only required for ExternalDrafter mode
  • --mtp-gamma N alone (no --mtp-source, no --mtp-gguf) triggers Auto detection
  • Old --mtp-draft-source flag emits a stderr warning and is ignored (migration aid)
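
For illustration, the mapping from the --mtp-source argument to the enum could look roughly like this. The function name `parse_mtp_source` is an assumption, not taken from the PR; the four accepted spellings are the ones listed above.

```cpp
#include <optional>
#include <string_view>

enum class MtpSource { None, Native, ExternalDrafter, Auto };

// Hypothetical parser for the value of --mtp-source.
// Returns std::nullopt for anything outside none|native|external|auto,
// so the caller can decide how to report the error.
std::optional<MtpSource> parse_mtp_source(std::string_view value) {
    if (value == "none")     return MtpSource::None;
    if (value == "native")   return MtpSource::Native;
    if (value == "external") return MtpSource::ExternalDrafter;
    if (value == "auto")     return MtpSource::Auto;
    return std::nullopt;
}
```

Returning an empty optional keeps the parser free of I/O; the real server may instead print usage and exit.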

Auto-detect

gguf_contains_mtp_tensors(path) opens the GGUF, checks for qwen35.nextn_predict_layers > 0 — the same field qwen35_mtp_loader.cpp requires. Pure metadata scan, no tensor allocation, no GPU touch.
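
The detection criterion itself is small enough to sketch in isolation. The block below deliberately leaves out the GGUF file reading and works on an in-memory key/value map; `GgufMeta`, `find_u32`, and `metadata_indicates_mtp` are hypothetical names, and treating the value as a 32-bit unsigned integer is an assumption. Only the key name and the "> 0" test come from the description above.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for GGUF metadata that has already been read
// (the real probe reads it from the file without allocating tensors).
using GgufMeta = std::unordered_map<std::string, uint32_t>;

// Look up an integer metadata value; empty optional when the key is absent.
std::optional<uint32_t> find_u32(const GgufMeta & meta, const std::string & key) {
    const auto it = meta.find(key);
    if (it == meta.end()) return std::nullopt;
    return it->second;
}

// True only when the key exists and reports at least one next-token-prediction layer.
bool metadata_indicates_mtp(const GgufMeta & meta) {
    const auto n = find_u32(meta, "qwen35.nextn_predict_layers");
    return n.has_value() && *n > 0;
}
```

A missing key and a key set to 0 both count as "no MTP" here, which matches the loader's early "not an MTP variant" failure for the missing-key case.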

Backward compat

  • --mtp-gguf PATH without --mtp-source → inferred as ExternalDrafter (old behavior preserved)
  • --mtp-gamma N without --mtp-source and without --mtp-gguf → Auto (probes primary GGUF)
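
Taken together, the inference rules can be sketched as one pure function. `MtpCliArgs` and `resolve_mtp_source` are hypothetical names, and the fallback to None when Auto finds no MTP metadata is an assumption: the description only states that a positive probe selects Native.

```cpp
enum class MtpSource { None, Native, ExternalDrafter, Auto };

// Hypothetical summary of the relevant CLI inputs.
struct MtpCliArgs {
    bool         has_source    = false;            // --mtp-source was given
    MtpSource    source        = MtpSource::None;  // its parsed value
    const char * mtp_gguf_path = nullptr;          // --mtp-gguf PATH
    int          mtp_gamma     = 0;                // --mtp-gamma N
};

// primary_has_mtp is the result of probing the primary GGUF.
MtpSource resolve_mtp_source(const MtpCliArgs & a, bool primary_has_mtp) {
    MtpSource s = a.source;
    if (!a.has_source) {
        if (a.mtp_gguf_path != nullptr) s = MtpSource::ExternalDrafter;  // old behavior
        else if (a.mtp_gamma > 0)       s = MtpSource::Auto;
        else                            s = MtpSource::None;
    }
    if (s == MtpSource::Auto) {
        // Assumption: no MTP metadata means MTP stays disabled.
        s = primary_has_mtp ? MtpSource::Native : MtpSource::None;
    }
    return s;
}
```

Passing the probe result in as a plain bool keeps the decision logic testable without touching a GGUF file.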

Tests

All 20 tests in the mock-based test_common_mtp_orchestrator suite pass. The build is clean.

Note on tensor naming

gguf_contains_mtp_tensors() uses the qwen35.nextn_predict_layers metadata key as the detection heuristic — same criterion as qwen35_mtp_loader.cpp (which fails early with "not an MTP variant" if this key is absent). This covers the unsloth single-file format. If a different MTP GGUF format emerges without this key, howard can advise and we'll add a tensor-name scan as a fallback.

Per @howard0su's review on Luce-Org#237 (lines 57, 59):
- 57: 'shall we use the unsloth single-file MTP-in-target GGUF?'
- 59: 'why not a enum?'

Replaces:
  const char * mtp_gguf_path = nullptr;
  const char * mtp_draft_source = nullptr;  // "chain" | "mtp_topk"
with:
  enum class MtpSource { None, Native, ExternalDrafter, Auto };
  MtpSource    mtp_source     = MtpSource::None;
  const char * mtp_gguf_path  = nullptr;  // only for ExternalDrafter
  bool         mtp_use_topk   = false;    // false=chain, true=mtp_topk

Adds gguf_contains_mtp_tensors() probe (keyed on qwen35.nextn_predict_layers
metadata) so --mtp-gguf becomes optional when the primary GGUF embeds
MTP tensors (unsloth single-file case).

Stacked on Luce-Org#237. dflash_server arg parsing updated to:
- --mtp-source [none|native|external|auto] (new explicit flag)
- --mtp-gguf PATH (now optional; only needed for ExternalDrafter)
- Old --mtp-draft-source string flag warns + ignored (migration aid)
- --mtp-gamma alone triggers Auto detection

All test_common_mtp_orchestrator tests still pass (mock-based,
unaffected by the config-surface change).
@dusterbloom dusterbloom merged commit 388098c into feat/dflash-mtp-foundation May 21, 2026